We report an updated measurement of the CP-violating phase, $\phi_s^{J/\psi\phi}$, and the decay-width
difference for the two mass eigenstates, $\Delta\Gamma_s$, from the flavor-tagged decay $B_s^0 \to J/\psi\phi$.
The data sample corresponds to an integrated luminosity of 8.0 fb$^{-1}$ accumulated with the D0 detector
using $p\bar{p}$ collisions at $\sqrt{s} = 1.96$ TeV produced at the Fermilab Tevatron collider. The 68% Bayesian
credibility intervals, including systematic uncertainties, are $\Delta\Gamma_s = 0.163^{+0.065}_{-0.064}$ ps$^{-1}$
and $\phi_s^{J/\psi\phi} = -0.55^{+0.38}_{-0.36}$. The p-value for the Standard Model point is 29.8%.

PACS numbers: 13.25.Hw, 11.30.Er 



I. INTRODUCTION 

Meson-antimeson mixing and the phenomenon of charge-conjugation-parity (CP) violation in neutral
meson systems are key problems of particle physics. In the standard model (SM), the light (L) and
heavy (H) mass eigenstates of the $B_s^0$ system are expected to have sizeable mass and decay width
differences: $\Delta M_s = M_H - M_L$ and $\Delta\Gamma_s = \Gamma_L - \Gamma_H$. The two mass eigenstates
are expected to be almost pure CP eigenstates. The CP-violating phase that appears in $b \to c\bar{c}s$
decays is due to the interference of the decay with and without mixing, and it is predicted [1] to be
$\phi_s^{J/\psi\phi} = -2\beta_s^{SM} = 2\arg[-V_{tb}V_{ts}^{*}/V_{cb}V_{cs}^{*}] = -0.038 \pm 0.002$, where
$V_{ij}$ are elements of the Cabibbo-Kobayashi-Maskawa quark-mixing matrix [2]. New phenomena [3-23]
may alter the observed phase to $\phi_s^{J/\psi\phi} \equiv -2\beta_s \equiv -2\beta_s^{SM} + \phi_s^{\Delta}$.
A significant deviation of $\phi_s^{J/\psi\phi}$ from its small SM value would indicate the presence of
processes beyond the SM.

The analysis of the decay chain $B_s^0 \to J/\psi\phi$, $J/\psi \to \mu^+\mu^-$, $\phi \to K^+K^-$ separates
the CP-even and CP-odd states using the angular distributions of the decay products. It is a unique
feature of the decay $B_s^0 \to J/\psi\phi$ that, because of the sizeable lifetime difference between the
two mass eigenstates, there is sensitivity to $\phi_s^{J/\psi\phi}$ even in the absence of flavor tagging
information. The first direct constraint on $\phi_s^{J/\psi\phi}$ [24, 25] was derived by analyzing
$B_s^0 \to J/\psi\phi$ decays where the flavor (i.e., $B_s^0$ or $\bar{B}_s^0$) at the time of production was
not determined ("tagged"). It was followed by an improved analysis [26], based on 2.8 fb$^{-1}$ of
integrated luminosity, that included the information on the flavor at production. In addition, the
CDF Collaboration has performed a measurement [27] of $\phi_s^{J/\psi\phi}$ using 1.35 fb$^{-1}$ of data.
After the submission of this Article, new measurements of the CP violation parameters in the
$B_s^0 \to J/\psi\phi$ decay have been published by the CDF [28] and the LHCb [29] Collaborations.

In this Article, we present new results from the time-dependent amplitude analysis of the decay
$B_s^0 \to J/\psi\phi$ using a data sample corresponding to an integrated luminosity of 8.0 fb$^{-1}$
collected with the D0 detector [30] at the Fermilab Tevatron Collider. In addition to the increase in
the size of the data sample used in the analysis, we also take into account the S-wave $K^+K^-$ under
the $\phi$ peak that has been suggested [31] to contribute between 5% and 10%. We measure
$\Delta\Gamma_s$; the average lifetime of the $B_s^0$ system, $\bar{\tau}_s = 1/\bar{\Gamma}_s$, where
$\bar{\Gamma}_s \equiv (\Gamma_H + \Gamma_L)/2$; and the CP-violating phase $\phi_s^{J/\psi\phi}$.

Section II briefly describes the D0 detector. Section III presents the event reconstruction and the
data set. Sections IV and V describe the event selection requirements and the procedure of
determining the flavor of the initial state of the $B_s^0$ candidate. In Sec. VI we describe the analysis
formalism and the fitting method, present fit results, and discuss systematic uncertainties in the
results. We obtain the Bayesian credibility intervals for physics parameters using a procedure based
on the Markov Chain Monte Carlo (MCMC) technique, presented in Sec. VII. We summarize and discuss
the results in Sec. VIII.



II. DETECTOR 

The D0 detector consists of a central tracking system, a calorimetry system, and muon detectors, as
detailed in Refs. [30, 32, 33]. The central tracking system comprises a silicon microstrip tracker
(SMT) and a central fiber tracker (CFT), both located inside a 1.9 T superconducting solenoidal
magnet. The tracking system is designed to optimize tracking and vertexing for pseudorapidities
$|\eta| < 3$, where $\eta = -\ln[\tan(\theta/2)]$, and $\theta$ is the polar angle with respect to the
proton beam direction.
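
As a quick numerical illustration of the pseudorapidity definition above, the following sketch evaluates $\eta = -\ln[\tan(\theta/2)]$ and its inverse. The function names are illustrative only and do not come from any D0 software.

```python
import math


def pseudorapidity(theta: float) -> float:
    """Return eta = -ln[tan(theta/2)] for a polar angle theta in radians."""
    if not 0.0 < theta < math.pi:
        raise ValueError("theta must lie strictly between 0 and pi")
    return -math.log(math.tan(theta / 2.0))


def polar_angle(eta: float) -> float:
    """Invert the definition: theta = 2 * atan(exp(-eta))."""
    return 2.0 * math.atan(math.exp(-eta))


if __name__ == "__main__":
    # Polar angle (in degrees) at the edge of the quoted |eta| < 3 tracking coverage.
    print(math.degrees(polar_angle(3.0)))
```

A track perpendicular to the beam ($\theta = \pi/2$) has $\eta = 0$, and larger $|\eta|$ corresponds to directions closer to the beam axis.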

The SMT can reconstruct the $p\bar{p}$ interaction vertex (PV) for interactions with at least three
tracks with a precision of 40 $\mu$m in the plane transverse to the beam direction and determine the
impact parameter of any track relative to the PV with a precision between 20 and 50 $\mu$m, depending
on the number of hits in the SMT.

The muon detector is positioned outside the calorimeter. It consists of a central muon system
covering the pseudorapidity region $|\eta| < 1$ and a forward muon system covering the pseudorapidity
region $1 < |\eta| < 2$. Both central and forward systems consist of a layer of drift tubes and
scintillators inside 1.8 T toroidal magnets and two similar layers outside the toroids.

The trigger and data acquisition systems are designed 
to accommodate the high instantaneous luminosities of 
Tevatron Run II. 



III. DATA SAMPLE AND EVENT RECONSTRUCTION



The analysis presented here is based on data accumu- 
lated between February 2002 and June 2010. Events are 
collected with a mixture of single- and dimuon triggers. 
Some triggers require track displacement with respect to 
the primary vertex (large track impact parameter) . Since 
this condition biases the lifetime measurement, the 
events selected exclusively by these triggers are removed 
from our sample. 

Candidate $B_s^0 \to J/\psi\phi$, $J/\psi \to \mu^+\mu^-$, $\phi \to K^+K^-$
events are required to include two opposite charge muons 
accompanied by two opposite charge tracks. Both muons 
are required to be detected in the muon chambers inside 
the toroid magnet, and at least one of the muons is re- 
quired to be also detected outside the toroid. Each of 
the four final-state tracks is required to have at least one 
SMT hit. 

To form $B_s^0$ candidates, muon pairs in the invariant mass range 3.096 ± 0.350 GeV, consistent
with $J/\psi$ decay, are combined with pairs of opposite charge tracks (assigned the kaon mass)
consistent with production at a common vertex, and with an invariant mass in the range
1.019 ± 0.030 GeV. A kinematic fit under the $B_s^0$ decay hypothesis constrains the dimuon invariant
mass to the world-average $J/\psi$ mass [34] and constrains the four-track system to a common vertex.

Trajectories of the four $B_s^0$ decay products are adjusted according to the decay-vertex kinematic
fit. The re-adjusted track parameters are used in the calculation of the $B_s^0$ candidate mass and
decay time, and of the three angular variables characterizing the decay as defined later. $B_s^0$
candidates are required to have an invariant mass in the range 5.37 ± 0.20 GeV. In events where
multiple candidates satisfy these requirements, we select the candidate with the best decay vertex
fit probability.

To reconstruct the PV, we select tracks that do not originate from the candidate $B_s^0$ decay, and
apply a constraint to the average beam-spot position in the transverse plane. We define the signed
decay length of a $B_s^0$ meson, $L_{xy}^B$, as the vector pointing from the PV to the decay vertex,
projected on the $B_s^0$ transverse momentum $p_T$. The proper decay time of a $B_s^0$ candidate is given
by $t = M_{B_s} \vec{L}_{xy}^B \cdot \vec{p}/(p_T^2)$, where $M_{B_s}$ is the world-average $B_s^0$ mass [34],
and $\vec{p}$ is the particle momentum. The distance in the beam direction between the PV and the
$B_s^0$ vertex is required to be less than 5 cm. Approximately 5 million events are accepted after the
selection described in this section.
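
The proper decay time definition above can be sketched numerically as follows. This is a minimal illustration, not the D0 reconstruction code: the function name and argument layout are hypothetical, the mass value 5.37 GeV is only a stand-in taken from the center of the mass window quoted earlier (the text uses the world-average mass without quoting it), and the factor of $c$ is written out explicitly to convert to picoseconds.

```python
C_CM_PER_PS = 0.0299792458  # speed of light in cm per picosecond


def proper_decay_time_ps(pv_xy, vtx_xy, pt_xy, mass_gev=5.37):
    """Proper decay time t = M * (L_xy . p_T) / (p_T^2 * c), in ps.

    pv_xy, vtx_xy: (x, y) of primary and decay vertices in cm.
    pt_xy: (px, py) of the candidate in GeV.
    mass_gev: candidate mass hypothesis in GeV (illustrative default).
    """
    dx = vtx_xy[0] - pv_xy[0]
    dy = vtx_xy[1] - pv_xy[1]
    px, py = pt_xy
    pt2 = px * px + py * py
    if pt2 == 0.0:
        raise ValueError("transverse momentum must be non-zero")
    # The dot product gives a signed quantity: negative when the vertex
    # appears displaced against the momentum direction (resolution effect).
    return mass_gev * (dx * px + dy * py) / (pt2 * C_CM_PER_PS)


if __name__ == "__main__":
    # 0.05 cm transverse flight distance along a 10 GeV transverse momentum.
    print(proper_decay_time_ps((0.0, 0.0), (0.05, 0.0), (10.0, 0.0)))
```

Because the decay length is signed, candidates dominated by resolution effects populate both positive and negative decay times.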



IV. BACKGROUND SUPPRESSION



The selection criteria are designed to optimize the measurement of $\phi_s^{J/\psi\phi}$ and
$\Delta\Gamma_s$. Most of the background is due to directly produced $J/\psi$ mesons accompanied by
tracks arising from hadronization. This "prompt" background is distinguished from the "non-prompt",
or "inclusive $B \to J/\psi + X$" background, where the $J/\psi$ meson is a product of a $b$-hadron decay
while the tracks forming the $\phi$ candidate emanate from a multi-body decay of a $b$ hadron or from
hadronization. Two different event selection approaches are used, one based on a multivariate
technique, and one based on simple limits on kinematic and event quality parameters.



IV-A. Signal and background simulation 

Three Monte Carlo simulated samples are used to study background suppression: signal, prompt
background, and non-prompt background. All three are generated with PYTHIA [35]. Hadronization is
also done in PYTHIA, but all hadrons carrying heavy flavors are passed on to EvtGen [36] to model
their decays. The prompt background MC sample consists of $J/\psi \to \mu^+\mu^-$ decays produced in
$gg \to J/\psi g$, $gg \to J/\psi\gamma$, and $g\gamma \to J/\psi g$ processes. The signal and non-prompt
background samples are generated from primary $b\bar{b}$ pair production with all $b$ hadrons being
produced inclusively and the $J/\psi$ mesons forced into $\mu^+\mu^-$ decays. For the signal sample,
events with a $B_s^0$ are selected, their decays to $J/\psi\phi$ are implemented without mixing and
with uniform angular distributions, and the mean lifetime is set to $\bar{\tau}_s =$
1.464 ps. There are approximately 10^ events in each 
background and the signal MC samples. All events are passed through a full GEANT-based [37] detector
simulation. To take into account the effects of multiple interactions at high luminosity, hits from
randomly triggered $p\bar{p}$ collisions are overlaid on the digitized hits from MC. These events are
reconstructed with the same program as used for data. The three samples are corrected so that the
$p_T$ distributions of the final state particles in $B_s^0 \to J/\psi\phi$ decays match those in data
(see Appendix B).



IV-B. Multivariate event selection

To discriminate the signal from background events, we use the TMVA package [38]. In preliminary
studies using MC simulation, the Boosted Decision Tree (BDT) algorithm was found to demonstrate the
best performance. Since prompt and non-prompt backgrounds have different kinematic behavior, we
train two discriminants, one for each type of background. We use a set of 33 variables for the
prompt background and 35 variables for the non-prompt background. The variables and more details of
the BDT method are given in Appendix A.

The BDT training is performed using a subset of the MC samples, and the remaining events are used to
test the training. The signal MC sample has about 84k events, the prompt background has 29k events,
and the non-prompt background has 39k events. Figure 1 shows the BDT output discriminant for the
prompt and non-prompt cases.

FIG. 1: (color online). BDT discriminant output for the prompt (top) and non-prompt (bottom)
classifiers. The signal and background events are taken from simulation. Events used for BDT
training are excluded from these samples.

IV-C. Selection Criteria

To choose the best set of criteria for the two BDT discriminants, we first step through the values
of both BDT discriminants from -0.4 to 0.8 in increments of 0.01 and measure the $B_s^0$ signal yield
for each choice of cuts. Next, we define 14 signal yield regions between 4000 and 7000 events, and
for each region choose the pair of BDT cuts which gives the highest significance $S/\sqrt{S+B}$,
where $S$ ($B$) is the number of signal (background) events in the data sample. The 14 points, in
increasing order of the signal size $S$, are shown in Table I. Figure 2 shows the number of signal
events as a function of the total number of events for the 14 points. As the BDT criteria are
loosened, the total number of events increases by a factor of ten, while the number of signal events
increases by about 50%.
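
The per-region choice of cuts described above can be sketched as follows. This is only an illustration under stated assumptions: the `ScanPoint` record and the function names are hypothetical, the region edges are taken to be 14 equal-width bins between 4000 and 7000 (the text does not say how the regions are spaced), and the signal yield for each cut pair is treated as a given input, whereas in the analysis it is measured from data.

```python
import math
from typing import Iterable, List, NamedTuple


class ScanPoint(NamedTuple):
    nonprompt_cut: float  # cut on the non-prompt BDT discriminant
    prompt_cut: float     # cut on the prompt BDT discriminant
    signal: float         # measured signal yield S
    total: float          # S + B in the selected sample


def significance(point: ScanPoint) -> float:
    """Figure of merit S / sqrt(S + B)."""
    return point.signal / math.sqrt(point.total) if point.total > 0 else 0.0


def best_per_region(points: Iterable[ScanPoint], lo: float = 4000.0,
                    hi: float = 7000.0, n_regions: int = 14) -> List[ScanPoint]:
    """Keep, for each signal-yield region, the cut pair with the highest significance."""
    width = (hi - lo) / n_regions
    best = {}
    for p in points:
        if not lo <= p.signal < hi:
            continue
        region = int((p.signal - lo) // width)
        if region not in best or significance(p) > significance(best[region]):
            best[region] = p
    return [best[r] for r in sorted(best)]


if __name__ == "__main__":
    scan = [
        ScanPoint(0.45, 0.42, 4550.0, 38130.0),
        ScanPoint(0.40, 0.40, 4560.0, 50000.0),  # invented competitor in the same region
        ScanPoint(0.45, 0.29, 4699.0, 44535.0),
    ]
    for p in best_per_region(scan):
        print(p, round(significance(p), 2))
```

In a full scan the input would contain one entry per pair of cut values on the 0.01 grid; only three entries are used here to keep the example short.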

As a test of possible detrimental effects of training on variables with low separation power, we
have repeated the above procedure using only the variables whose importance (see Appendix A)
exceeds 0.01, giving 18 variables for the prompt background and 13 variables for the non-prompt
background. The resulting number of background events for a given number of signal events is larger
by about 10%. Therefore, we proceed with the original number of variables.

Criteria Set      S      S + B    Non-prompt BDT    Prompt BDT
     0         4550      38130         0.45            0.42
     1         4699      44535         0.45            0.29
     2         5008      53942         0.39            0.35
     3         5213      64044         0.36            0.30
     4         5364      72602         0.33            0.28
     5         5558      85848         0.13            0.41
     6         5767     100986         0.21            0.29
     7         5988     120206         0.13            0.29
     8         6097     134255         0.07            0.29
     9         6399     189865         0.04            0.10
    10         6489     254022        -0.05           -0.01
    11         6608     294949        -0.13            0.00
    12         6594     364563        -0.18           -0.14
    13         6695     461744        -0.35           -0.08

TABLE I: Numbers of signal and signal-plus-background events for different sets of BDT criteria,
shown in the last two columns, that give the largest value of $S/\sqrt{S+B}$ for a given $S$.

FIG. 2: Number of $B_s^0 \to J/\psi\phi$ signal events as a function of the total number of events
for the 14 criteria sets considered.



The choice of the final cut on the BDT output is based on an ensemble study. For each point in
Table I we perform a maximum-likelihood fit to the event distribution in the 2-dimensional (2D)
space of $B_s^0$ candidate mass and proper time. This 2D fit provides a parametrization of the
background mass and proper time distribution. We then generate pseudo-experiments in the 5D space
of $B_s^0$ candidate mass, proper time, and three independent angles of decay products, using as
input the parameters as obtained in a preliminary study, and the background from the 2D fit. We
perform a 5D maximum likelihood fit on the ensembles and compare the distributions of the
statistical uncertainties of $\phi_s^{J/\psi\phi}$ ($\sigma(\phi_s^{J/\psi\phi})$) and $\Delta\Gamma_s$
($\sigma(\Delta\Gamma_s)$) for the different sets of criteria. The dependence of the mean values of
$\sigma(\phi_s^{J/\psi\phi})$ and $\sigma(\Delta\Gamma_s)$ on the number of signal events is shown in
Figs. 3(a) and 3(b). The mean statistical uncertainties of both $\phi_s^{J/\psi\phi}$ and
$\Delta\Gamma_s$ systematically decrease with increasing signal, favoring looser cuts. The gain in the
parameter resolution is slower for the three loosest criteria, while the total number of events
doubles from about $0.25 \times 10^6$ to $0.5 \times 10^6$. The fits used for these ensemble tests were
simplified; therefore the magnitude of the predicted uncertainty is expected to underestimate that
of the final measurement. However, the general trends should be valid.

FIG. 3: Ensemble study results for (a) mean value of $\sigma(\phi_s^{J/\psi\phi})$ as a function of the
number of signal events and (b) mean value of $\sigma(\Delta\Gamma_s)$ as a function of the number of
signal events.

Based on these results, we choose the sample that contains about 6500 signal events (labeled
"Set 10" in Table I) as a final selection and refer to it as the "BDT
selection" . Figure [T7] in Appendix |^ shows the ratios 
of the normalized distributions of the three angles (see 
Section VI) and the lifetime before and after the BDT
selection. The ratios are consistent with unity, which 
means that the BDT requirements do not significantly 
alter these distributions. 



IV-D. Simple Selection

We select a second event sample by applying criteria on event quality and kinematic quantities. We
use the consistency of the results obtained for the BDT and for this sample as a measure of
systematic effects related to imperfect modeling of the detector acceptance and of the selection
requirements.

The criteria are the same as in Refs. [IJ and [2^. Each 
of the four tracks is required to have at least two SMT hits and at least eight hits in SMT or CFT.
We require minimum momentum in the transverse plane $p_T$ for $B_s^0$, $\phi$, and $K$ meson candidates
of 6.0 GeV, 1.5 GeV, and 0.7 GeV, respectively. Muons are required to have $p_T$ above 1.5 GeV. For
events in the central rapidity region (an event is considered to be central if the higher $p_T$ muon
has $|\eta(\mu_{leading})| < 1$), we require the transverse momentum of the $J/\psi$ meson to exceed
4 GeV. In addition, $J/\psi$ candidates are accepted if the invariant mass of the muon pair is in the
range 3.1 ± 0.2 GeV. Events are required to satisfy the condition $\sigma(t) < 0.2$ ps, where
$\sigma(t)$ is the uncertainty on the decay proper time obtained from the propagation of the
uncertainties in the decay-vertex kinematic fit, the primary vertex position, and the $B_s^0$
candidate transverse momentum. We refer to this second sample as the "Square-cuts" sample.



V. FLAVOR TAGGING

At the Tevatron, $b$ quarks are mostly produced in $b\bar{b}$ pairs. The flavor of the initial state
of the $B_s^0$ candidate is determined by exploiting properties of particles produced by the other
$b$ hadron ("opposite-side tagging", or OST). The OST-discriminating variables $x_i$ are based
primarily on the presence of a muon or an electron from the semileptonic decay or the decay vertex
charge of the other $b$ hadron produced in the $p\bar{p}$ interaction.

For the initial $b$ quark, the probability density function (PDF) for a given variable $x_i$ is
denoted as $f_i^{b}(x_i)$, while for the initial $\bar{b}$ quark it is denoted as $f_i^{\bar{b}}(x_i)$.
The combined tagging variable $y$ is defined as:

$$ y = \prod_{i=1}^{n} y_i; \qquad y_i = \frac{f_i^{\bar{b}}(x_i)}{f_i^{b}(x_i)}. \qquad (1) $$

A given variable $x_i$ can be undefined for some events. For example, there are events that do not
contain an identified muon from the opposite side. In this case, the corresponding variable $y_i$ is
set to 1.

In this way the OST algorithm assigns to each event a value of the predicted tagging parameter
$d = (1 - y)/(1 + y)$ in the range [-1, 1], with $d > 0$ tagged as an initial $b$ quark and $d < 0$
tagged as an initial $\bar{b}$ quark. Larger $|d|$ values correspond to higher tagging confidence. In
events where no tagging information is available $d$ is set to zero. The efficiency $\epsilon$ of the
OST, defined as the fraction of the number of candidates with $d \neq 0$, is 18%. The
OST-discriminating variables and algorithm are described in detail in Ref. [39].

The tagging dilution $\mathcal{D}$ is defined as

$$ \mathcal{D} = \frac{N_{cor} - N_{wr}}{N_{cor} + N_{wr}}, \qquad (2) $$

where $N_{cor}$ ($N_{wr}$) is the number of events with correctly (wrongly) identified initial
$B$-meson flavor. The dependence of the tagging dilution on the tagging parameter $d$ is calibrated
with data for which the flavor ($B$ or $\bar{B}$) is known.

V-A. OST calibration

The dilution calibration is based on four independent $B_d^0 \to \mu\nu D^{*\pm}$ data samples
corresponding to different time periods, denoted IIa, IIb1, IIb2, and IIb3, with different detector
configurations and different distributions of instantaneous luminosity. The Run IIa sample was used
in Ref. [39].

For each sample we perform an analysis of the $B_d^0$-$\bar{B}_d^0$ oscillations described in
Ref. [40]. We divide the samples in five ranges of the tagging parameter $|d|$ and for each range we
obtain a mean value of the dilution $|\mathcal{D}|$. The mixing frequency $\Delta M_d$ is fitted
simultaneously and is found to be stable and consistent with the world average value. The measured
values of the tagging dilution $|\mathcal{D}|$ for the four data samples above, in different ranges of
$|d|$, are shown in Fig. 4. The dependence of the dilution on $|d|$ is parametrized as

$$ |\mathcal{D}| = \frac{p_0}{1 + \exp((p_1 - |d|)/p_2)} - \frac{p_0}{1 + \exp(p_1/p_2)}, \qquad (3) $$

and the function is fitted to the data. All four measurements are in good agreement and hence a
weighted average is taken.
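
The mapping from the combined tagging variable to a per-event dilution can be sketched as below. This is an illustration only: it assumes the calibration curve is the difference of two logistic terms, so that $|\mathcal{D}|$ vanishes at $|d| = 0$, and the parameter values `P0`, `P1`, `P2` are invented placeholders, not the fitted D0 values (which are not quoted in the text).

```python
import math

# Hypothetical calibration parameters, chosen only to make the example run.
P0, P1, P2 = 0.6, 0.4, 0.1


def tagging_parameter(y: float) -> float:
    """Predicted tagging parameter d = (1 - y) / (1 + y), for y >= 0."""
    return (1.0 - y) / (1.0 + y)


def dilution(d: float, p0: float = P0, p1: float = P1, p2: float = P2) -> float:
    """Calibrated |D| as a function of |d|; equals zero for untagged events (d = 0)."""
    a = abs(d)
    return p0 / (1.0 + math.exp((p1 - a) / p2)) - p0 / (1.0 + math.exp(p1 / p2))


if __name__ == "__main__":
    for y in (1.0, 0.5, 0.1):
        d = tagging_parameter(y)
        print(f"y={y:.2f}  d={d:+.3f}  |D|={dilution(d):.3f}")
```

An event with no tagging information has $y = 1$, hence $d = 0$ and zero dilution, so it contributes to the fit as an untagged event.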



[Figure 4 legend: data points for the four $B_d^0 \to \mu\nu D^{*\pm}$ samples (IIa, IIb1, IIb2, IIb3);
Data: Weighted Ave.; Fit to Data; Uncertainty.]

FIG. 4: (color online). Parametrization of the dilution $|\mathcal{D}|$ as a function of the tagging
parameter $|d|$ for the combined opposite-side tagger. The curve is the result of the weighted fit to
four self-tagging control data samples (see text).



VI. MAXIMUM LIKELIHOOD FIT



We perform a six-dimensional (6D) unbinned maximum likelihood fit to the proper decay time and its
uncertainty, three decay angles characterizing the final state, and the mass of the $B_s^0$
candidate. We use events for which the invariant mass of the $K^+K^-$ pair is within the range
1.01 - 1.03 GeV. There are 104683 events in the BDT-based sample and 66455 events in the Square-cuts
sample. We adopt the formulae and notation of Ref. [41]. The normalized functional form of the
differential decay rate includes an S-wave $KK$ contribution in addition to the dominant P-wave
$\phi \to K^+K^-$ decay. To model the distributions of the signal and background we use the software
library RooFit [42].



VI-A. Signal model

The angular distribution of the signal is expressed in the transversity basis [43]. In the
coordinate system of the $J/\psi$ rest frame, where the $\phi$ meson moves in the $x$ direction, the
$z$ axis is perpendicular to the decay plane of $\phi \to K^+K^-$, and $p_y(K^+) \geq 0$. The
transversity polar and azimuthal angles $\theta$ and $\varphi$ describe the direction of the
positively-charged muon, while $\psi$ is the angle between $\vec{p}(K^+)$ and $-\vec{p}(J/\psi)$ in the
$\phi$ rest frame. The angles are shown in Fig. 5.

In this basis, the decay amplitude of the $B_s^0$ and $\bar{B}_s^0$ mesons is decomposed into three
independent components corresponding to linear polarization states of the vector mesons $J/\psi$
and $\phi$, which are polarized either longitudinally (0) or transversely to their direction of
motion, and parallel ($\parallel$) or perpendicular ($\perp$) to each other.

FIG. 5: (color online). Definition of the angle $\psi$, and the transversity angles $\theta$ and
$\varphi$.

The time dependence of amplitudes $\mathcal{A}_i(t)$ and $\bar{\mathcal{A}}_i(t)$ ($i$ denotes one of
$\{\parallel, \perp, 0\}$), for $B_s^0$ and $\bar{B}_s^0$ states to reach the final state $J/\psi\phi$
is:



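$$ \mathcal{A}_i(t) = F(t)\left[E_+(t) \pm e^{2i\beta_s} E_-(t)\right] a_i, $$
$$ \bar{\mathcal{A}}_i(t) = F(t)\left[\pm E_+(t) + e^{-2i\beta_s} E_-(t)\right] a_i, \qquad (4) $$

where

$$ F(t) = \frac{e^{-\bar{\Gamma}_s t/2}}{\sqrt{\tau_H + \tau_L \pm \cos 2\beta_s\,(\tau_L - \tau_H)}}, \qquad (5) $$

and $\tau_H$ and $\tau_L$ are the lifetimes of the heavy and light $B_s^0$ eigenstates.

In the above equations the upper sign indicates a CP-even final state, the lower sign indicates a
CP-odd final state,

$$ E_\pm(t) \equiv \frac{1}{2}\left[e^{\left(-\frac{\Delta\Gamma_s}{4} + i\frac{\Delta M_s}{2}\right)t} \pm e^{-\left(-\frac{\Delta\Gamma_s}{4} + i\frac{\Delta M_s}{2}\right)t}\right], \qquad (6) $$

and the amplitude parameters $a_i$ give the time-integrated decay rate to each of the polarization
states, $|a_i|^2$, satisfying:

$$ \sum_i |a_i|^2 = 1. \qquad (7) $$

The interference terms $A_\parallel - A_\perp$ and $A_0 - A_\perp$ are proportional to
$(e^{-\Gamma_H t} - e^{-\Gamma_L t}) \sin\phi_s^{J/\psi\phi}$. Also, if $\cos\phi_s^{J/\psi\phi}$ is
significantly different from unity, the decay rates of the CP-even and CP-odd components have two
slopes each.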

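The normalized probability density functions $P_B$ and $P_{\bar{B}}$ for $B$ and $\bar{B}$ mesons in
the variables $t$, $\cos\psi$, $\cos\theta$, and $\varphi$, are

$$ P_B(\theta, \varphi, \psi, t) = \frac{9}{16\pi} |\mathbf{A}(t) \times \hat{n}|^2, $$
$$ P_{\bar{B}}(\theta, \varphi, \psi, t) = \frac{9}{16\pi} |\bar{\mathbf{A}}(t) \times \hat{n}|^2, \qquad (8) $$

where $\hat{n}$ is the muon momentum direction in the $J/\psi$ rest frame,

$$ \hat{n} = (\sin\theta\cos\varphi, \sin\theta\sin\varphi, \cos\theta), \qquad (9) $$

and $\mathbf{A}(t)$ and $\bar{\mathbf{A}}(t)$ are complex vector functions of time defined as

$$ \mathbf{A}(t) = \left(\mathcal{A}_0(t)\cos\psi,\; -\frac{\mathcal{A}_\parallel(t)\sin\psi}{\sqrt{2}},\; i\frac{\mathcal{A}_\perp(t)\sin\psi}{\sqrt{2}}\right), $$
$$ \bar{\mathbf{A}}(t) = \left(\bar{\mathcal{A}}_0(t)\cos\psi,\; -\frac{\bar{\mathcal{A}}_\parallel(t)\sin\psi}{\sqrt{2}},\; i\frac{\bar{\mathcal{A}}_\perp(t)\sin\psi}{\sqrt{2}}\right). \qquad (10) $$

The values of $\mathcal{A}_i(t)$ at $t = 0$ are denoted as $A_i$. They are related to the parameters
$a_i$ by

$$ |A_\perp|^2 = \frac{|a_\perp|^2\, y}{1 + (y - 1)|a_\perp|^2}, \qquad
   |A_\parallel|^2 = \frac{|a_\parallel|^2}{1 + (y - 1)|a_\perp|^2}, \qquad
   |A_0|^2 = \frac{|a_0|^2}{1 + (y - 1)|a_\perp|^2}, \qquad (11) $$

where $y \equiv (1 - z)/(1 + z)$ and $z \equiv \cos 2\beta_s\, \Delta\Gamma_s/(2\bar{\Gamma}_s)$. By
convention, the phase of $A_0$ is set to zero and the phases of the other two amplitudes are denoted
by $\delta_\parallel$ and $\delta_\perp$.

For a given event, the decay rate is the sum of the functions $P_B$ and $P_{\bar{B}}$ weighted by the
flavor tagging dilution factors $(1 + \mathcal{D})/2$ and $(1 - \mathcal{D})/2$, respectively.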
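
To make the structure of the tagged decay rate concrete, the sketch below evaluates the angular density and the dilution-weighted sum at a single point. It assumes the standard transversity-basis form $P = (9/16\pi)\,|\mathbf{A} \times \hat{n}|^2$ with $\mathbf{A} = (A_0\cos\psi,\; -A_\parallel\sin\psi/\sqrt{2},\; iA_\perp\sin\psi/\sqrt{2})$. The function names are illustrative, and the amplitudes are passed in as plain complex numbers evaluated at a fixed time, so the time evolution itself is not modeled here.

```python
import math


def muon_direction(theta, phi):
    """Unit vector n = (sin(theta)cos(phi), sin(theta)sin(phi), cos(theta))."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


def amplitude_vector(a0, a_par, a_perp, psi):
    """Complex vector A built from the three transversity amplitudes."""
    s = math.sin(psi) / math.sqrt(2.0)
    return (a0 * math.cos(psi), -a_par * s, 1j * a_perp * s)


def cross(a, n):
    """Cross product of a complex 3-vector a with a real 3-vector n."""
    return (a[1] * n[2] - a[2] * n[1],
            a[2] * n[0] - a[0] * n[2],
            a[0] * n[1] - a[1] * n[0])


def angular_density(amps, theta, phi, psi):
    """(9 / 16 pi) |A x n|^2 for amplitudes amps = (A0, A_par, A_perp)."""
    c = cross(amplitude_vector(*amps, psi), muon_direction(theta, phi))
    return 9.0 / (16.0 * math.pi) * sum(abs(x) ** 2 for x in c)


def tagged_rate(amps_b, amps_bbar, dilution, theta, phi, psi):
    """Weight the B and B-bar densities by (1 + D)/2 and (1 - D)/2."""
    w_b, w_bbar = (1.0 + dilution) / 2.0, (1.0 - dilution) / 2.0
    return (w_b * angular_density(amps_b, theta, phi, psi)
            + w_bbar * angular_density(amps_bbar, theta, phi, psi))


if __name__ == "__main__":
    amps = (0.74, 0.49 + 0.1j, 0.46j)  # invented amplitudes for illustration
    print(tagged_rate(amps, amps, 0.3, 1.0, 0.5, 0.8))
```

With zero dilution the two flavors enter with equal weight, which is why an untagged event still carries information through the angular and lifetime distributions.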

The contribution from the decay to $J/\psi K^+K^-$ with the kaons in an S-wave is expressed in terms
of the S-wave fraction $F_S$ and a phase $\delta_s$. The squared sum of the P and S waves is
integrated over the $KK$ mass. For the P-wave, we assume the non-relativistic Breit-Wigner model

$$ g(M(KK)) = \sqrt{\frac{\Gamma_\phi/2}{\Delta M(KK)}} \cdot \frac{1}{M(KK) - M_\phi + i\Gamma_\phi/2} \qquad (12) $$

with the $\phi$ meson mass $M_\phi = 1.019$ GeV and width $\Gamma_\phi = 4.26$ MeV [34], and with
$\Delta M(KK) = 1.03 - 1.01 = 0.02$ GeV.

For the S-wave component, we assume a uniform distribution in the range $1.01 < M(KK) < 1.03$ GeV.
We constrain the oscillation frequency to $\Delta M_s = 17.77 \pm 0.12$ ps$^{-1}$, as measured in
Ref. [45]. Table II lists all physics parameters used in the fit.



Parameter                                        Definition
$|A_0|^2$                                        P-wave longitudinal amplitude squared, at $t = 0$
$A_1$                                            $|A_\parallel|^2/(1 - |A_0|^2)$
$\bar{\tau}_s$ (ps)                              $B_s^0$ mean lifetime
$\Delta\Gamma_s$ (ps$^{-1}$)                     Heavy-light decay width difference
$F_S$                                            $K^+K^-$ S-wave fraction
$\beta_s$                                        CP-violating phase ($\equiv -\phi_s^{J/\psi\phi}/2$)
$\delta_\parallel$                               $\arg(A_\parallel/A_0)$
$\delta_\perp$                                   $\arg(A_\perp/A_0)$
$\delta_s$                                       $\arg(A_s/A_0)$

TABLE II: Definition of nine real measurables for the decay $B_s^0 \to J/\psi\phi$ used in the
Maximum Likelihood fitting.

For the signal mass distribution we use a Gaussian function with a free mean value, width, and
normalization. The function describing the signal rate in the 6D space is invariant under the
combined transformation $\beta_s \to \pi/2 - \beta_s$, $\Delta\Gamma_s \to -\Delta\Gamma_s$,
$\delta_\parallel \to 2\pi - \delta_\parallel$, $\delta_\perp \to \pi - \delta_\perp$, and
$\delta_s \to \pi - \delta_s$. In addition, with a limited flavor-tagging power, there is an
approximate symmetry around $\beta_s = 0$ for a given sign of $\Delta\Gamma_s$.
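
The discrete ambiguity can be written as a small helper that maps one parameter point to its mirror solution. This sketch assumes the combined transformation is $\beta_s \to \pi/2 - \beta_s$, $\Delta\Gamma_s \to -\Delta\Gamma_s$, $\delta_\parallel \to 2\pi - \delta_\parallel$, $\delta_\perp \to \pi - \delta_\perp$, $\delta_s \to \pi - \delta_s$. The dictionary keys are hypothetical names chosen for the example, and the numerical values are invented.

```python
import math


def mirror_solution(params: dict) -> dict:
    """Return the parameter point related to `params` by the combined transformation."""
    mirrored = dict(params)  # parameters not listed below are left unchanged
    mirrored["beta_s"] = math.pi / 2.0 - params["beta_s"]
    mirrored["delta_gamma_s"] = -params["delta_gamma_s"]
    mirrored["delta_par"] = 2.0 * math.pi - params["delta_par"]
    mirrored["delta_perp"] = math.pi - params["delta_perp"]
    mirrored["delta_s"] = math.pi - params["delta_s"]
    return mirrored


if __name__ == "__main__":
    point = {"beta_s": 0.25, "delta_gamma_s": 0.15, "delta_par": 3.2,
             "delta_perp": -0.1, "delta_s": 0.2, "tau_s": 1.42}
    print(mirror_solution(point))
```

Applying the map twice returns the starting point, so likelihood maxima come in pairs; this is one reason a single fit does not give meaningful point estimates for the phase parameters.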

We correct the signal decay rate by a detector acceptance factor $\epsilon(\psi, \theta, \varphi)$
parametrized by coefficients of expansion in Legendre polynomials $P_k(\psi)$ and real harmonics
$Y_{lm}(\theta, \varphi)$. The coefficients are obtained from Monte Carlo simulated samples, as
described in Appendix B.

The signal decay time resolution is given by a Gaussian centered at zero with a width given by the
product of a global scale factor and the event-by-event uncertainty in the decay time measurement.
The distribution of the uncertainty in the decay time measurement in the MC simulation is modeled
by a superposition of five Gaussian functions. The background-subtracted signal distribution agrees
well with the MC model, as seen in Fig. 6. Variations of the parameters within one sigma of the best
fit are used to define two additional functions, also shown in the figure, that are used in
alternative fits to estimate the systematic effect due to time resolution.



VI-B. Background model 

The proper decay time distribution of the background is described by a sum of a prompt component,
modeled as the same resolution function used in the signal decay time, and a non-prompt component.
The non-prompt component is modeled as a superposition of one exponential decay for $t < 0$ and two
exponential decays for $t > 0$, with free slopes and normalizations. The lifetime resolution is
modeled by an exponential convoluted with a Gaussian function, with two separate parameters for
prompt and non-prompt background. To allow for the possibility of the lifetime uncertainty to be
systematically underestimated, we introduce a free scale factor.

The mass distributions of the two components of background are parametrized by low-order
polynomials: a linear function for the prompt background and a quadratic function for the
non-prompt background. The angular distribution of background is parametrized by Legendre and real
harmonics expansion coefficients. A separate set of expansion coefficients $c^{k}_{lm}$, with
$k = 0$ or 2 and $l = 0, 1, 2$, is used for the prompt and for the non-prompt background. A
preliminary fit is first performed with all 17 parameters for prompt and 17 parameters for
non-prompt allowed to vary. In subsequent fits those that converge at values within two standard
deviations of zero are set to zero. Nine free parameters remain, five for non-prompt background:
c^^^\o^ '^^^^\2^ c^^^-'qq, and c'-^^^22j and four for prompt background: c^^\_i, ^^^^20' '^^^''22'
and c^^\_i-. All background parameters described above are varied simultaneously with physics
parameters. In total, there are 36 parameters used in the fit. In addition to the nine physics
parameters defined in Table II they are: signal yield, mean mass and width, non-prompt background
contribution, six non-prompt background lifetime parameters, four background time resolution
parameters, one time resolution scale factor, three background mass distribution parameters, and
nine parameters describing background angular distributions.

FIG. 6: (color online). The distribution of the uncertainty in the decay time, $\sigma(t)$ (ps), for
the signal, MC (squares) and background-subtracted data (crosses). The blue curve is the sum of five
Gaussian functions fitted to the MC distribution. The two red lines are variations of the default
function used in the studies of systematic effects.

VI-C. Fit results



The maximum likelihood fit results for the nominal fit (Default), for two alternative time
resolution functions, $\sigma_A(t)$ and $\sigma_B(t)$ shown in Fig. 6, and for an alternative
$M(KK)$ dependence of the $\phi(1020) \to K^+K^-$ decay with the decay width increased by a factor
of two are shown in Table III and Table IV. These alternative fits are used to estimate the
systematic uncertainties. The fit assigns 5598 ± 113 (5050 ± 105) events to the signal for the BDT
(Square-cuts) sample. Only the parameters whose values do not suffer from multi-modal effects are
shown. A single fit does not provide meaningful point estimates and uncertainties for the four
phase parameters. Their estimates are obtained using the MCMC technique. Figures 7-10 illustrate
the quality of the fit for the background, for all data, and for the signal-enhanced sub-samples.

An independent measurement of the S-wave fraction is described in Appendix C and the result is in
agreement with $F_S$ determined from the maximum likelihood fit.



Parameter                            Default          $\sigma_A(t)$    $\sigma_B(t)$    $\Gamma_\phi$ = 8.52 MeV
$|A_0|^2$                            0.553 ± 0.016    0.553 ± 0.016    0.552 ± 0.016    0.553 ± 0.016
$|A_\parallel|^2/(1 - |A_0|^2)$      0.487 ± 0.043    0.483 ± 0.043    0.485 ± 0.043    0.487 ± 0.043
$\bar{\tau}_s$ (ps)                  1.417 ± 0.038    1.420 ± 0.037    1.417 ± 0.037    1.408 ± 0.434
$\Delta\Gamma_s$ (ps$^{-1}$)         0.151 ± 0.058    0.136 ± 0.056    0.145 ± 0.057    0.170 ± 0.067
$F_S$                                0.147 ± 0.035    0.149 ± 0.034    0.147 ± 0.035    0.147 ± 0.035

TABLE III: Maximum likelihood fit results for the BDT selection. The uncertainties are statistical.

Parameter                            Default          $\sigma_A(t)$    $\sigma_B(t)$    $\Gamma_\phi$ = 8.52 MeV
$|A_0|^2$                            0.566 ± 0.017    0.564 ± 0.017    0.567 ± 0.017    0.566 ± 0.017
$|A_\parallel|^2/(1 - |A_0|^2)$      0.579 ± 0.048    0.579 ± 0.048    0.577 ± 0.048    0.579 ± 0.048
$\bar{\tau}_s$ (ps)                  1.439 ± 0.039    1.450 ± 0.038    1.457 ± 0.037    1.438 ± 0.042
$\Delta\Gamma_s$ (ps$^{-1}$)         0.199 ± 0.058    0.194 ± 0.057    0.185 ± 0.056    0.202 ± 0.060
$F_S$                                0.175 ± 0.035    0.169 ± 0.035    0.171 ± 0.035    0.175 ± 0.035

TABLE IV: Maximum likelihood fit results for the "Square-cuts" sample.




FIG. 7: (color online). The distributions in the background ($B_s^0$ mass sidebands) region of
candidate mass, proper decay time, decay time uncertainty, transversity polar and azimuthal angles,
and $\cos\psi$ for the BDT sample. The curves show the prompt (black dashed) and non-prompt (red
dotted) components, and their sum (blue solid).



VI-D. Systematic uncertainties

There are several possible sources of systematic uncertainty in the measurements. These
uncertainties are estimated as described below.

• Flavor tagging: The measured flavor mistag fraction suffers from uncertainties due to the limited
number of events in the data samples for the decay $B_d^0 \to \mu\nu D^{*\pm}$. The nominal
calibration of the flavor tagging dilution is determined as a weighted average of four samples
separated by the running period. As an alternative, we use two separate calibration parameters, one
for the data collected in running periods IIa and IIb1, and one for the IIb2 and IIb3 data. We also
alter the nominal parameters by their uncertainties. We find the effects of the changes to the
flavor mistag variation negligible.



• Proper decay time resolution: Fit results 
can be affected by the uncertainty of the as- 
sumed proper decay time resolution function. To 
assess the effect, we have used two alternative 
parametrizations obtained by random sampling of 
the resolution function. 

• Detector acceptance: The effects of imperfect 
modeling of the detector acceptance and of the se- 
lection requirements are estimated by investigat- 
ing the consistency of the fit results for the sample 
based on the BDT selection and on the Square-cuts 
selection. Although the overlap between the two 
samples is 70%, and some statistical differences are 
expected, we interpret the differences in the results 
as a measure of systematic effects. 
FIG. 8: (color online). Invariant mass, proper decay time, and proper decay time uncertainty distributions for $B_s^0$ candidates in
the (top) BDT sample and (bottom) Square-cuts sample. The curves are projections of the maximum likelihood fit. Shown are
the signal (green dashed-dotted curve), prompt background (black dashed curve), non-prompt background (red dotted curve),
total background (brown long-dashed curve), and the sum of signal and total background (solid blue curve).



The two event selection approaches have different
merits. The BDT-based approach uses more information
on each event, and hence it allows a higher
signal yield at lower background. However, it accepts
signal events of lower quality (large vertex
or proper decay time uncertainty) that are rejected
by the square cuts. Also, the BDT-based approach
uses the $M(KK)$ distribution as a discriminant in
the event selection, affecting the results for the
parameters entering the S-P interference term,
particularly the S-wave fraction $F_S$ and the phase
parameters.

The main difference between the two samples is in
the kinematic ranges of final-state kaons, and so
the angular acceptance functions and MC weights
(see Appendix B) are different for the two samples.
Imperfections in the modelling of the decay
kinematics and estimated acceptances, and in
the treatment of the MC weighting, are reflected
in differences between fit results. The differences
are used as an estimate of this class of systematic
uncertainty.

• $M(KK)$ resolution: The limited $M(KK)$ resolution
may affect the results of the analysis, especially
the phases and the S-wave fraction $F_S$,
through the dependence of the S-P interference
term on the P-wave mass model. In principle,
the function of Eq. (12) should be replaced by a
Breit-Wigner function convoluted with a Gaussian.
We avoid this complication by approximating the
smeared P-wave amplitude by a Breit-Wigner function
where the width of Eq. (12) is set to twice
the world average value to account for the detector
resolution effects. A MC simulation-based estimate
of the scale factor for the event selection criteria
used in this analysis yields a value in the range
1.5 - 1.7. The resulting complex integral of the S-P
interference has an absolute value behavior closer
to the data, but a distorted ratio of the real and
imaginary parts compared to Eq. (12). We repeat
the fits using this altered $\phi(1020)$ propagator as a
measure of the sensitivity to the $M(KK)$ resolution.

Tables III and IV compare results for the default fit and
the alternative fits discussed above. The differences be- 
tween the best-fit values provide a measure of systematic 
effects. For the best estimate of the credible intervals for 
all the measured physics quantities, we conduct MCMC 
studies described in the next section. 

Other sources of systematic uncertainties like the func- 
tional model of the background mass, lifetime and angle 
distributions were studied and give a negligible contribu- 
tion. 
FIG. 9: (color online). Distributions of transversity polar and azimuthal angles and $\cos\psi$ for $B_s^0$ candidates in the BDT sample
(top) and Square-cuts sample (bottom). The curves are projections of the maximum likelihood fit. Shown are the signal (green
dashed-dotted curve), total background (brown long-dashed curve) and the sum of signal and total background (blue solid
curve).



VII. BAYESIAN CREDIBILITY INTERVALS 
FROM MCMC STUDIES 



VII-A. The method 



The maximum likelihood fit provides the best values 
of all free parameters, including the signal observables
and background model parameters, their statistical un- 
certainties and their full correlation matrix. 

In addition to the free parameters determined in the 
fit, the model depends on a number of external constants 
whose inherent uncertainties are not taken into account 
in a given fit. Ideally, effects of uncertainties of external 
constants, such as time resolution parameters, flavor tag- 
ging dilution calibration, or detector acceptance, should 
be included in the model by introducing the appropriate 
parametrized probability density functions and allowing 
the parameters to vary. Such a procedure of maximizing 
the likelihood function over the external parameter space 
would greatly increase the number of free parameters and 
would be prohibitive. Therefore, as a trade-off, we randomly
sample the external parameter values within their
uncertainties, perform the analysis for each of the resulting
"alternative universes", and average the results.
To do the averaging in the multidimensional space, tak- 
ing into account non-Gaussian parameter distributions 
and correlations, we use the MCMC technique. 



The MCMC technique uses the Metropolis-Hastings
algorithm [44] to generate a sample representative of a
given probability distribution. The algorithm generates
a sequence of "states", a Markov chain, in which each
state depends only on the previous state.

To generate a Markov chain for a given data sample,
we start from the best-fit point $x$. We randomly generate
a point $x'$ in the parameter space according to the
multivariate normal distribution
$\exp(-(x' - x)^T \Sigma^{-1} (x' - x)/2)$,
where $\Sigma$ is the covariance matrix, $x$ is the
current point in the chain, and $x'$ is the next random point.
The best-fit point and the covariance matrix are obtained
from a maximum likelihood fit over the same data sample.
The new point is accepted if $\mathcal{L}(x')/\mathcal{L}(x) > 1$;
otherwise it is accepted with the probability
$\mathcal{L}(x')/\mathcal{L}(x)$. The
process is continued until the desired number of states is
reached. To avoid a bias due to the choice of the initial
state, we discard the early states, which may "remember"
the initial state. Our studies show that the initial state
is "forgotten" after approximately 50 steps. We discard
the first 100 states in each chain.
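
The chain-generation procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the analysis code: the function name `metropolis_chain`, the callable `log_like`, and the toy Gaussian target in the usage lines are hypothetical. The sketch assumes a symmetric Gaussian proposal centered on the current point, so the acceptance test reduces to the likelihood ratio, and it works with log-likelihoods for numerical stability. A rejected proposal repeats the current state, as is standard for this algorithm.

```python
import numpy as np

def metropolis_chain(log_like, x_best, cov, n_states, n_burn=100, seed=0):
    """Generate a Markov chain with a Gaussian random-walk proposal.

    log_like : callable returning the log-likelihood at a parameter point
    x_best   : starting point (e.g. the best-fit point)
    cov      : proposal covariance matrix (e.g. from the likelihood fit)
    n_burn   : number of early states to discard
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x_best, dtype=float)
    logl = log_like(x)
    states = []
    for _ in range(n_states + n_burn):
        # Propose x' from a multivariate normal centered on the current x.
        x_new = rng.multivariate_normal(x, cov)
        logl_new = log_like(x_new)
        # Accept always if L(x')/L(x) > 1, else with probability L(x')/L(x).
        if np.log(rng.uniform()) < logl_new - logl:
            x, logl = x_new, logl_new
        states.append(x.copy())
    # Drop the early states that may still "remember" the starting point.
    return np.array(states[n_burn:])

# Toy usage: a two-parameter standard normal stands in for the likelihood.
toy_log_like = lambda p: -0.5 * float(p @ p)
toy_chain = metropolis_chain(toy_log_like, [0.0, 0.0], np.eye(2), n_states=20000)
```

Working in logarithms avoids overflow in the ratio when the likelihood is a product over many events.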
FIG. 10: (color online). Distributions of transversity polar and azimuthal angles and $\cos\psi$ for $B_s^0$ candidates in the BDT
sample (top) and Square-cuts sample (bottom). The signal contribution is enhanced, relative to the distributions shown in
Fig. 9, by additional requirements on the reconstructed mass of the $B_s^0$ candidates ($5.31 < M(B_s^0) < 5.43$ GeV) and on the
proper time $t > 1.0$ ps. The curves are projections of the maximum likelihood fit. Shown are the signal (green dashed-dotted
curve), total background (brown long-dashed curve) and the sum of signal and total background (blue solid curve).



VII-B. General properties of MCMC chains for the 
BDT-selection and Square-cuts samples 

We generate eight MCMC chains, each containing one million
states: a nominal and three alternative chains each
for the BDT-selection and Square-cuts samples, according
to the fit results presented in Tables III and IV.

Figures 11 and 12 illustrate the dependence of $\phi_s^{J/\psi\phi}$
on other physics parameters, in particular on $\cos\delta_\perp$ and
$\cos\delta_s$. Each point shows the Markov chain representation
of the likelihood function integrated over all parameters
except the parameter of interest in a slice of
$\phi_s^{J/\psi\phi}$. For clarity, the profiles are shown for
$\Delta\Gamma_s > 0$ and $\Delta\Gamma_s < 0$ separately. The
distributions for the Square-cuts sample are similar. We note
the following salient features of these correlations for
$\Delta\Gamma_s > 0$:
FIG. 11: Profiles of $\Delta M_s$, $\bar{\tau}_s$, $\Delta\Gamma_s$, $\cos\delta_\perp$, $\cos\delta_s$, and $F_S$, for $\Delta\Gamma_s > 0$, versus $\phi_s^{J/\psi\phi}$ from the MCMC simulation for the BDT
selection data sample.



a) A positive correlation between $\phi_s^{J/\psi\phi}$ and $\Delta M_s$,
with the best fit of $\phi_s^{J/\psi\phi}$ changing sign as $\Delta M_s$
increases (see also Fig. 26 in Appendix D).

b) A correlation between $\phi_s^{J/\psi\phi}$ and $\bar{\tau}_s$, with the
highest $\bar{\tau}_s$ occurring at $\phi_s^{J/\psi\phi} = 0$.

c) For $\phi_s^{J/\psi\phi}$ near zero, $|\Delta\Gamma_s|$ increases with $|\phi_s^{J/\psi\phi}|$.

d) A strong positive correlation between $\phi_s^{J/\psi\phi}$ and
$\cos\delta_\perp$ near $\phi_s^{J/\psi\phi} = 0$, with $\phi_s^{J/\psi\phi}$ changing sign
as the average $\cos\delta_\perp$ increases between $-0.8$ and
$+0.8$. For the related decay $B_d^0 \to J/\psi K^*$ the measured
value is $\cos\delta_\perp = -0.97$. This indicates that
a constraint of $\cos\delta_\perp$ to the $B_d^0 \to J/\psi K^*$ value
would result in $\phi_s^{J/\psi\phi} < 0$ with a smaller uncertainty.

e) A strong positive correlation between $\phi_s^{J/\psi\phi}$ and
$\cos\delta_s$ near $\phi_s^{J/\psi\phi} = 0$, with $\phi_s^{J/\psi\phi}$ changing sign
as the average $\cos\delta_s$ increases between $-0.4$ and
$+0.4$.

f) A weak correlation between $\phi_s^{J/\psi\phi}$ and $F_S$, with $F_S$
a few percent lower for $\phi_s^{J/\psi\phi} < 0$.

While we do not use any external numerical constraints
on the polarization amplitudes, we note that the best-fit
values of their magnitudes and phases are consistent
with those measured in the $U(3)$-flavor related decay
$B_d^0 \to J/\psi K^*$ [Hi, up to the sign ambiguities. Ref. ^
predicts that the phases of the polarization amplitudes
in the two decay processes should agree within approximately
0.17 radians. For $\delta_\perp$, our measurement gives
equivalent solutions near $\pi$ and near zero, with only the
former being in agreement with the value of $2.91 \pm 0.06$
measured for $B_d^0 \to J/\psi K^*$ by B factories. Therefore, in
the following we limit the range of $\delta_\perp$ to $\cos\delta_\perp < 0$.

To obtain the credible intervals for physics parameters, 
taking into account non-Gaussian tails and systematic 
effects, we combine the MCMC chains for the nominal 
and alternative fits. This is equivalent to an effective 
averaging of the resulting probability density functions 
from the fits. First, we combine the four MCMC chains 
for each sample. We then combine all eight chains, to 
produce the final result. 
FIG. 12: Profiles of $\Delta M_s$, $\bar{\tau}_s$, $\Delta\Gamma_s$, $\cos\delta_\perp$, $\cos\delta_s$, and $F_S$, for $\Delta\Gamma_s < 0$, versus $\phi_s^{J/\psi\phi}$ from the MCMC simulation for the BDT
selection data sample.



VII-C. Results 



Figure 13 shows 68%, 90% and 95% credible regions
in the ($\phi_s^{J/\psi\phi}$, $\Delta\Gamma_s$) plane for the BDT-based
and for the Square-cuts samples. The point estimates of physics
parameters are obtained from one-dimensional projections.
The minimal range containing 68% of the area of the
probability density function defines the one standard
deviation credible interval for each parameter, while the
most probable value defines the central value.
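
The point-estimate convention described above (minimal range containing 68% of the probability, most probable value as the central value) can be illustrated on a set of chain states. The following sketch is only an assumption about how such a calculation might be coded; the helper names `shortest_interval` and `most_probable_value`, the histogram binning, and the toy Gaussian samples are hypothetical and do not come from the analysis.

```python
import numpy as np

def shortest_interval(samples, level=0.68):
    """Return the narrowest range that contains `level` of the samples."""
    s = np.sort(np.asarray(samples, dtype=float))
    n = len(s)
    k = int(np.ceil(level * n))            # number of points inside the range
    widths = s[k - 1:] - s[:n - k + 1]     # width of every window of k points
    i = int(np.argmin(widths))
    return s[i], s[i + k - 1]

def most_probable_value(samples, bins=100):
    """Estimate the mode as the center of the most populated histogram bin."""
    counts, edges = np.histogram(samples, bins=bins)
    j = int(np.argmax(counts))
    return 0.5 * (edges[j] + edges[j + 1])

# Toy usage: pooled states of several chains for one parameter.
rng = np.random.default_rng(1)
pooled = np.concatenate([rng.normal(size=50000) for _ in range(4)])
low, high = shortest_interval(pooled)
mode = most_probable_value(pooled)
```

For an asymmetric distribution the interval returned this way is not centered on the mode, which is how asymmetric uncertainties arise.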



The large correlation coefficient (0.85) between the two
phases, $\delta_\perp$ and $\delta_s$, prevents us from making separate
point estimates. Their individual errors are much larger
than the uncertainty on their difference. For the BDT
selection, the measured S-wave fraction $F_S$(eff) is an
effective fraction of the $K^+K^-$ S-wave in the accepted
sample, in the mass range $1.01 < M(K^+K^-) < 1.03$
GeV. It includes the effect of the diminished acceptance
for the S-wave with respect to the P-wave in the event
selection.



This procedure gives the following results for the BDT-based
sample:

$$
\begin{aligned}
\bar{\tau}_s &= 1.426^{+0.035}_{-0.032}\ \mathrm{ps}, \\
\Delta\Gamma_s &= 0.129^{+0.076}_{-0.053}\ \mathrm{ps}^{-1}, \\
\phi_s^{J/\psi\phi} &= -0.49^{+0.48}_{-0.40}, \\
|A_0|^2 &= 0.552^{+0.016}_{-0.017}, \\
|A_\parallel|^2 &= 0.219^{+0.020}_{-0.021}, \\
\delta_\parallel &= 3.15 \pm 0.27, \\
\cos(\delta_\perp - \delta_s) &= -0.06 \pm 0.24, \\
F_S(\mathrm{eff}) &= 0.146 \pm 0.035.
\end{aligned}
$$

$F_S$(eff) in this case refers to the "effective" $F_S$ since it
is not a physical parameter: because of the BDT cut on the $\phi$
mass, the measured $F_S$ depends on the efficiency of the
selection for non-resonant $B_s^0 \to J/\psi K^+K^-$.

The one-dimensional estimates of physics parameters
for the Square-cuts sample are:
FIG. 13: (color online). Two-dimensional 68%, 90% and
95% credible regions for (a) the BDT selection and (b) the
Square-cuts sample. The standard model expectation is
indicated as a point with an error.

FIG. 14: (color online). Two-dimensional 68%, 90% and 95%
credible regions including systematic uncertainties. The
standard model expectation is indicated as a point with an error.



VIII. SUMMARY AND DISCUSSION 

We have presented a time-dependent angular analysis
of the decay process $B_s^0 \to J/\psi\phi$. We measure mixing
parameters, average lifetime, and decay amplitudes. In
addition, we measure the magnitudes and phases of the
polarization amplitudes. We also measure the level of
the $KK$ S-wave contamination in the mass range (1.01 -
1.03) GeV, $F_S$. The measured values and the 68% credible
intervals, including systematic uncertainties, with the
oscillation frequency constrained to $\Delta M_s = 17.77 \pm 0.12$
ps$^{-1}$, are:



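$$
\begin{aligned}
\bar{\tau}_s &= 1.443^{+0.038}_{-0.035}\ \mathrm{ps}, \\
\Delta\Gamma_s &= 0.163^{+0.065}_{-0.064}\ \mathrm{ps}^{-1}, \\
\phi_s^{J/\psi\phi} &= -0.55^{+0.38}_{-0.36}, \\
|A_0|^2 &= 0.558^{+0.017}_{-0.019}, \\
|A_\parallel|^2 &= 0.231^{+0.024}_{-0.030}, \\
\delta_\parallel &= 3.15 \pm 0.22, \\
\cos(\delta_\perp - \delta_s) &= -0.11^{+0.27}_{-0.25}, \\
F_S &= 0.173 \pm 0.036.
\end{aligned}
$$

(13)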



To obtain the final credible intervals for physics
parameters, we combine all eight MCMC chains, effectively
averaging the probability density functions of the results
of the fits to the BDT and Square-cuts samples.
Figure 14 shows 68%, 90% and 95% credible regions in the
($\phi_s^{J/\psi\phi}$, $\Delta\Gamma_s$) plane. The p-value for the SM point [1]
($\phi_s^{J/\psi\phi}$, $\Delta\Gamma_s$) = ($-0.038$, 0.087 ps$^{-1}$) is 29.8%. The
one-dimensional 68% credible intervals are listed in
Section VIII below.



The p-value for the SM point ($\phi_s^{J/\psi\phi}$, $\Delta\Gamma_s$) =
($-0.038$, 0.087 ps$^{-1}$) is 29.8%.

In the previous publication [26], which was based on
a subset of this data sample, we constrained the strong
phases to those of $B_d^0 \to J/\psi K^*$, whereas this analysis
has a large enough data sample to reliably let them
float. Also, the previous publication did not have a large
enough data sample to allow for the measurement of a
significant level of $KK$ S-wave, whereas it is measured
together with its relative phase in the current analysis.
The results supersede our previous measurements.
Independently of the Maximum Likelihood analysis,
we make an estimate of the non-resonant $K^+K^-$ in the
final state based on the $M(KK)$ distribution of the
$B_s^0$ signal yield. The result of this study (Appendix C) is
consistent with the result of the Maximum Likelihood fit
shown above.
Appendix A: BDT Discriminants

Two BDT discriminants are used to reject background.
One is trained to remove the prompt background (the
"prompt BDT"), and the other is trained to remove inclusive
$B$ decays (the "inclusive BDT"). The prompt BDT
uses 33 variables, listed in Table V. The inclusive BDT
uses 35 variables, listed in Table VI. In these tables, $\Delta R$
is defined as $\Delta R = \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2}$, where $\eta$ is the
pseudorapidity and $\phi$ is the azimuthal angle. The term
"uncorrected" refers to quantities before the correction due
to the $J/\psi$ mass constraint. "Leading" ("trailing") muon or
kaon refers to the particle with larger (smaller) $p_T$, and
$dE/dx$ is the energy loss per unit path length of a charged
particle as it traverses the silicon detector. Isolation is
defined as $p(B)/\sum_{<\Delta R} p$, where $p(B)$ is the sum of the
momenta of the four daughter particles of the $B_s^0$ candidate,
and the sum is over all particles within a cone defined by
$\Delta R$, including the decay products of the $B_s^0$ candidate.
The tables also show the importance and separation for each
variable. The separation $\langle S^2 \rangle$ of a classifier $y$ is defined as

$$
\langle S^2 \rangle = \frac{1}{2} \int \frac{(y_S(y) - y_B(y))^2}{y_S(y) + y_B(y)}\, dy, \qquad \text{(A1)}
$$

where $y_S$ is the output of the discriminant function for
signal events and $y_B$ is the output of the discriminant
function for background. The importance of each BDT input
variable is derived by counting in the training how often the
variable is used to split decision tree nodes and by weighting
each split occurrence by its separation gain squared and
by the number of events in the node.
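
As an illustration of the separation defined above, the integral can be approximated with binned, unit-normalized distributions of a variable for signal and background. This is a sketch under assumptions, not the training software: the function name `separation`, the binning, and the toy samples are hypothetical, and the normalization with the factor 1/2 is assumed so that the value is 0 for identical shapes and 1 for shapes with no overlap.

```python
import numpy as np

def separation(sig, bkg, bins=50, value_range=None):
    """Binned estimate of 0.5 * integral (yS - yB)^2 / (yS + yB) dy."""
    sig = np.asarray(sig, dtype=float)
    bkg = np.asarray(bkg, dtype=float)
    if value_range is None:
        value_range = (min(sig.min(), bkg.min()), max(sig.max(), bkg.max()))
    # density=True normalizes each histogram to unit area.
    ys, edges = np.histogram(sig, bins=bins, range=value_range, density=True)
    yb, _ = np.histogram(bkg, bins=bins, range=value_range, density=True)
    width = edges[1] - edges[0]
    total = ys + yb
    filled = total > 0                      # skip empty bins (0/0)
    return 0.5 * np.sum((ys[filled] - yb[filled]) ** 2 / total[filled]) * width

# Toy usage: two fully separated samples and two identical ones.
rng = np.random.default_rng(0)
toy_sig = rng.uniform(0.0, 1.0, 10000)
toy_bkg = rng.uniform(2.0, 3.0, 10000)
sep_disjoint = separation(toy_sig, toy_bkg, bins=30)
sep_identical = separation(toy_sig, toy_sig)
```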

The distributions for the six most important variables
in training on prompt $J/\psi$ decays are shown in Fig. 15.
The distributions for the six most important variables in
the training on inclusive $B \to J/\psi X$ decays are shown
in Fig. 16.

Figure 17 compares the shapes of the distributions of
the three angular variables and the lifetime, before and
after the BDT requirements. The figures show that the
BDT requirements do not affect these differential
distributions significantly.



Rank  Importance  Separation  Variable
 1    0.3655      0.3540      $KK$ invariant mass
 2    0.1346      0.4863      Maximum $\Delta R$ between either $K$ meson and the $B_s^0$ candidate
 3    0.0390      0.1784      Isolation using the maximum $\Delta R$ between either $K$ and the $B_s^0$
 4    0.0346      0.3626      Uncorrected $p_T$ of the $B_s^0$
 5    0.0335      0.4278      Minimum $\Delta R$ between either $K$ and the $B_s^0$
 6    0.0331      0.4854      $p_T$ of the trailing $K$ meson
 7    0.0314      0.4998      $p_T$ of the $\phi$ meson
 8    0.0283      0.4884      $p_T$ of the leading $K$ meson
 9    0.0252      0.0809      Trailing muon momentum
10    0.0240      0.1601      $p_T$ of the leading muon
11    0.0223      0.1109      Maximum $\Delta R$ between either muon and the $B_s^0$
12    0.0217      0.0162      Maximum $\chi^2$ of either $K$ meson with the $J/\psi$ vertex
13    0.0215      0.0145      Dimuon invariant mass
14    0.0213      0.021       Maximum $\chi^2$ of either $K$ candidate track
15    0.0207      0.1739      $B_s^0$ isolation using the larger $K/B_s^0$ $\Delta R$ and tracks from the PV
16    0.0205      0.1809      $p_T$ of the $J/\psi$ meson
17    0.0188      0.1023      Minimum $\Delta R$ between either muon and the $B_s^0$ candidate
18    0.0105      0.3159      Trailing $K$ momentum
19    0.0093      0.0119      $\chi^2$ of the $B_s^0$ candidate vertex
20    0.0084      0.0241      $B_s^0$ isolation using $\Delta R < 0.75$
21    0.0081      0.0069      Minimum $\chi^2$ of the $J/\psi$ vertex with either $K$
22    0.0079      0.0922      $p_T$ of the trailing muon
23    0.0073      0.0057      Minimum of the $\chi^2$ of the $J/\psi$ and $\phi$ vertices
24    0.0070      0.0405      Isolation using $\Delta R < 0.5$
25    0.0068      0.2103      Uncorrected $B_s^0$ total momentum
26    0.0065      0.0266      Minimum $\chi^2$ of either $K$ track fit
27    0.0057      0.0401      Isolation using $\Delta R < 0.5$ and particles from the PV
28    0.0051      0.3217      Leading $K$ meson momentum
29    0.0048      0.0908      Leading muon momentum
30    0.0048      0.3233      $\phi$ meson momentum
31    0.0044      0.0061      Maximum $\chi^2$ of the $J/\psi$ or $\phi$ vertices
32    0.0037      0.0259      Isolation using $\Delta R < 0.75$ and particles from the PV
33    0.0037      0.1004      $J/\psi$ meson momentum



TABLE V: Variables used to train the prompt BDT, ranked by their importance in the training. 
Rank  Importance  Separation  Variable
 1    0.2863      0.3603      $KK$ invariant mass
 2    0.1742      0.4511      Isolation using the larger $K/B_s^0$ $\Delta R$ and tracks from the PV
 3    0.0778      0.1076      Minimum $dE/dx$ of either $K$
 4    0.0757      0.2123      of $B_s^0$
 5    0.0559      0.4856      $p_T$ of the $\phi$ meson
 6    0.0504      0.4745      $p_T$ of the leading $K$ meson
 7    0.0429      0.4468      Isolation using the maximum $\Delta R$ between either $K$ and the $B_s^0$
 8    0.0350      0.4774      $p_T$ of the trailing $K$ meson
 9    0.0260      0.2051      Maximum $\chi^2$ of either $K$ meson with the $J/\psi$ vertex
10    0.0229      0.1703      Isolation using $\Delta R < 0.5$ and particles from the PV
11    0.0154      0.2238      Isolation using $\Delta R < 0.75$ and tracks from the PV
12    0.0151      0.1308      Minimum $\chi^2$ of either $K$ with the $J/\psi$ vertex
13    0.0115      0.3104      Minimum $\Delta R$ between either $K$ meson and the $B_s^0$ candidate
14    0.0099      0.0190      Dimuon invariant mass
15    0.0091      0.3307      Total momentum of the meson
16    0.0089      0.1198      $p_T$ of the $J/\psi$ meson
17    0.0082      0.0594      Trailing muon momentum
18    0.0073      0.1695      Isolation using $\Delta R < 0.5$
19    0.0070      0.3794      Maximum $\Delta R$ between either $K$ meson and the $B_s^0$ candidate
20    0.0069      0.0528      Maximum $dE/dx$ of either $K$ meson
21    0.0068      0.3253      Trailing $K$ meson momentum
22    0.0063      0.0057      J vertex
23    0.0058      0.3277      Leading $K$ meson momentum
24    0.0054      0.0267      Maximum $\chi^2$ of either $K$ candidate track
25    0.0046      0.2203      Isolation using $\Delta R < 0.75$
26    0.0041      0.0729      Minimum $\Delta R$ between either muon and the $B_s^0$ candidate
27    0.0039      0.0284      Minimum $\chi^2$ of either $K$ candidate track
28    0.0036      0.2485      Uncorrected $p_T$ of $B_s^0$ candidate
29    0.0029      0.0702      $p_T$ of the trailing muon
30    0.0027      0.0645      $J/\psi$ momentum
31    0.0026      0.0872      Maximum $\Delta R$ between either muon and the $B_s^0$ candidate
32    0.0017      0.0098      Vertex $\chi^2$ of the $\phi$ meson
33    0.0014      0.1675      Uncorrected $B_s^0$ momentum
34    0.0011      0.1008      $p_T$ of the leading muon
35    0.0009      0.0547      Leading muon momentum



TABLE VI: Variables used to train the non-prompt BDT, ranked by their importance in the training. 
FIG. 15: (color online) The distributions of the six most important variables used in the BDT trained on prompt $J/\psi$ production
for the $B_s^0 \to J/\psi\phi$ signal (solid blue) and prompt $J/\psi$ events (red dashed) histograms.

FIG. 16: (color online) The distributions of the six most important variables used in the BDT trained on inclusive $B \to J/\psi X$
decays for the $B_s^0 \to J/\psi\phi$ signal (solid blue) and inclusive $B \to J/\psi X$ decays (red dashed) histograms.

FIG. 17: Test of uniformity of the efficiencies of the BDT selection using a MC sample with $\phi_s = -0.5$. The figure shows the
ratios of the normalized distributions of (a - c) the three angles and (d) the proper decay length, before and after the BDT
selection.
Appendix B: Detector acceptance

We take into account the shaping of the signal distribution
by the detector acceptance and kinematic selection
by introducing acceptance functions in the three angles
of the transversity basis. The acceptance functions are
derived from Monte Carlo simulation. Due to the event
triggering effects, the momentum spectra of final-state
objects in data are harder than in MC. We take into
account the difference in the $p_T$ distribution of the
final-state objects in data and MC by introducing a weight
factor as a function of $p_T(J/\psi)$, separately for the
central ($|\eta(\mu_{\rm leading})| < 1$) and forward regions. The weight
factor is derived by forcing an agreement between the
$J/\psi$ transverse momentum spectra in data and MC. The
behavior of the weight factor as a function of $p_T(J/\psi)$
for the BDT-based selection, for the central and forward
regions, is shown in Fig. 18.
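
The idea of forcing agreement between the data and MC transverse momentum spectra can be illustrated with a simple binned ratio. This is a sketch under assumptions rather than the procedure used here, where the weight is parametrized by an empirical smooth function; the function name `pt_weights`, the bin edges, and the toy exponential spectra are hypothetical.

```python
import numpy as np

def pt_weights(pt_data, pt_mc, edges):
    """Per-event MC weights from the ratio of normalized pT histograms."""
    h_data, _ = np.histogram(pt_data, bins=edges, density=True)
    h_mc, _ = np.histogram(pt_mc, bins=edges, density=True)
    # Ratio data/MC per bin; bins without MC events get weight 1.
    ratio = np.divide(h_data, h_mc, out=np.ones_like(h_data), where=h_mc > 0)
    # Look up the bin of each MC event; out-of-range events use the edge bins.
    index = np.clip(np.digitize(pt_mc, edges) - 1, 0, len(edges) - 2)
    return ratio[index]

# Toy usage: the "data" spectrum is harder than the "MC" spectrum.
rng = np.random.default_rng(0)
toy_data = rng.exponential(5.0, 100000)
toy_mc = rng.exponential(4.0, 100000)
toy_edges = np.linspace(0.0, 30.0, 16)
toy_w = pt_weights(toy_data, toy_mc, toy_edges)
```

In practice one such set of weights would be derived separately for each detector region.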

Figure 19 shows the background-subtracted $p_T$ distributions
of the leading and trailing muon and leading and
trailing kaon, in the central region. There is a good
agreement between data and MC for all final-state particles
after applying the weight factor. The acceptance in $\varphi$
and $\theta$ is shown in Fig. 20. The acceptance in $\psi$ is shown
in Fig. 21.
FIG. 18: Weight factor as a function of $p_T(J/\psi)$ used to
correct MC $p_T$ distribution of $B_s^0$ and decay objects for
(a) central region, and (b) forward region. The curves are
empirical fits to a sum of a Landau function and a polynomial.

FIG. 19: Transverse momentum distributions of the four final-state particles in data (points) and weighted MC (solid histogram),
for the BDT-based event selection.

FIG. 20: (color online). Map of the detector acceptance on the plane $\varphi$ - $\cos\theta$.

FIG. 21: Detector acceptance as a function of $\cos\psi$. The acceptance is uniform in $\cos\psi$.



Appendix C: Independent estimate of Fs 



In the Maximum Likelihood fit, the invariant mass of
the $K^+K^-$ pair is not used. To do so would require
a good model of the $M(K^+K^-)$ dependence of background,
including a small $\phi(1020)$ component, as a function
of the $B_s^0$ candidate mass and proper time. However,
we can use the $M(K^+K^-)$ mass information to make an
independent estimate of the non-resonant $K^+K^-$ contribution
in the final state.

For this study, we use the "Square-cuts" sample, for
which the event selection is not biased in $M(K^+K^-)$.
Using events with decay length $ct > 0.02$ cm to suppress
background, we extract the $B_s^0$ signal in two ranges
of $M(K^+K^-)$: $1.01 < M(KK) < 1.03$ GeV and
$1.03 < M(KK) < 1.05$ GeV. The first range is that
used by both selections, and contains the bulk of the
$\phi \to K^+K^-$ signal. The second range will still contain a
small Breit-Wigner tail of $\phi \to K^+K^-$. From the simulated
$M(K^+K^-)$ distribution of the $B_s^0 \to J/\psi\phi$ decay,
shown in Fig. 22, we obtain the fraction of the $K^+K^-$ decay
products in the upper mass range to be $0.061 \pm 0.001$
of the total range $1.01 < M(KK) < 1.05$ GeV. The S-wave
component is assumed to be a flat distribution in
$M(KK)$ across this range. Given that the widths of the
ranges are the same, the number of candidates due to the
S-wave contribution should be the same for both.

The $B_s^0$ signal in each mass range is extracted by fitting
the $B_s^0$ candidate mass distribution to a Gaussian
function representing the signal, a linear function for the
background, and MC simulation-based templates for the
$B_d^0 \to J/\psi K^*$ reflection where the pion from the $K^*$
decay is assumed to be a kaon. The two shape templates
used, one for each mass range, are shown in Fig. 23. The
mass distributions, with fits using the above templates,
are shown in Fig. 24. The fits result in the $B_s^0$ yield
of $3027 \pm 93$ events for $1.01 < M(KK) < 1.03$ GeV
and $547 \pm 94$ events for $1.03 < M(KK) < 1.05$ GeV. In
the mass range $1.01 < M(KK) < 1.03$ GeV, we extract
the fraction of $B_s^0$ candidates decaying into non-resonant
$KK$ to be $0.12 \pm 0.03$. The error includes the uncertainties
in the signal and background modelling. This excess
may be due to an S-wave, or a non-resonant P-wave, or
a combination of both. If we assign it entirely to the
S-wave, and assume it to be independent of $M(KK)$,
we obtain the measured S-wave fraction in the range
$1.01 < M(K^+K^-) < 1.03$ GeV to be $F_S = 0.12 \pm 0.03$.
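
The arithmetic behind this estimate can be reproduced from the two quoted yields and the simulated tail fraction. The sketch below is one possible reading of the procedure, not the analysis code: it assumes that each range contains the same flat non-resonant yield, that the resonant yield is split between the ranges as (1 - f) and f with f = 0.061, and that only the statistical uncertainties of the two yields are propagated. The function name `s_wave_fraction` is hypothetical.

```python
import math

def s_wave_fraction(n_low, err_low, n_high, err_high, f_tail):
    """Flat-component fraction in the lower range from two-range yields.

    Model: n_low = (1 - f) * n_phi + n_flat,  n_high = f * n_phi + n_flat.
    """
    k = 1.0 - 2.0 * f_tail
    n_phi = (n_low - n_high) / k           # total resonant yield
    n_flat = n_high - f_tail * n_phi       # flat yield per range
    # n_flat = a * n_high - b * n_low, so propagate the two yield errors.
    a = (1.0 - f_tail) / k
    b = f_tail / k
    err_flat = math.hypot(a * err_high, b * err_low)
    # The uncertainty of the denominator is neglected in this sketch.
    return n_flat / n_low, err_flat / n_low

fraction, error = s_wave_fraction(3027.0, 93.0, 547.0, 94.0, 0.061)
print(round(fraction, 2), round(error, 2))  # prints: 0.12 0.03
```

Under these assumptions the flat component amounts to about 375 candidates per range, which gives the quoted fraction.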
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FIG. 22: Invariant mass distribution of kaon pairs from the full simulation of the decay 
delineate the two M{KK) invariant mass bins considered. 
Appendix D: $B_s^0 - \bar{B}_s^0$ oscillation

Under the hypothesis of CP conservation in the $B_s^0$
decay, and a possible mixing-induced CP violation, the
non-vanishing CP-violating mixing angle should manifest
itself as a $B_s^0 - \bar{B}_s^0$ oscillation with the amplitude
proportional to $\sin(\phi_s^{J/\psi\phi})$. The observed time-dependent
asymmetry $\Delta N = N(B_s^0) - N(\bar{B}_s^0) = N_S \cdot C \cdot \sin(\phi_s^{J/\psi\phi})$
is diluted by a product $C$ of several factors: (i) a factor
of $(1 - 2|A_\perp|^2) \cdot (1 - 2F_S) \approx 0.6 \cdot 0.7$ due to the presence of
the CP-odd decay, (ii) a factor of $\epsilon D^2 \approx 0.03$ due to the
flavor tagging efficiency and accuracy, and (iii) a factor
of $\exp(-(\Delta M_s \sigma)^2/2) \approx 0.2$ due to the limited time
resolution. Thus, with $N_S \approx 6000$ events, and $C \approx 0.0025$,
we expect $N_S \cdot C \approx 15$.

In Fig. 25 we show the proper decay length evolution of
$\Delta N$ in the first 900 $\mu$m, corresponding to approximately
twice the mean $B_s^0$ lifetime. The curve represents a fit
to the function $N_0 \cdot \sin(\Delta M_s t) \cdot \exp(-t/\bar{\tau}_s)$, with $N_0$
unconstrained and with $\Delta M_s = 17.77$ ps$^{-1}$. The fit gives
$N_0 = -6$ for the BDT-based sample and $-8$ for the
Square-cuts sample, with a statistical uncertainty of $\pm 4$,
corresponding to $\sin(\phi_s^{J/\psi\phi}) = N_0/(N_S \cdot C) = -0.4 \pm 0.3$.
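
The numbers in this appendix follow from simple arithmetic, reproduced below as a check. The sketch assumes that the quoted CP-odd factor is the product 0.6 × 0.7, which is the reading that reproduces the stated total dilution of about 0.0025; the variable names are hypothetical, and the rounded scale of 15 events is taken from the text.

```python
# Approximate dilution factors quoted in the text.
cp_odd_factor = 0.6 * 0.7       # (1 - 2|A_perp|^2) * (1 - 2 F_S)
tagging_factor = 0.03           # tagging efficiency times dilution squared
resolution_factor = 0.2         # exp(-(Delta M_s * sigma)^2 / 2)

dilution = cp_odd_factor * tagging_factor * resolution_factor
n_signal = 6000
full_amplitude = n_signal * dilution    # asymmetry amplitude for sin(phi_s) = 1

def sin_phi(n0, n0_err, scale):
    """Convert a fitted oscillation amplitude into sin(phi_s) and its error."""
    return n0 / scale, n0_err / scale

# Fitted amplitudes quoted in the text, with the rounded scale N_S * C = 15.
sin_bdt, err_bdt = sin_phi(-6.0, 4.0, 15.0)
sin_square, err_square = sin_phi(-8.0, 4.0, 15.0)
```

Here `dilution` evaluates to about 0.0025 and `full_amplitude` to about 15, while the BDT amplitude corresponds to a value of about -0.4 with an uncertainty close to 0.3.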

This one-dimensional analysis gives a result for $\phi_s^{J/\psi\phi}$
that is consistent with the result of the full analysis.

Following the Amplitude Method described in 
Ref. [l^ , we fit the above distributions at discrete values 
of $\Delta M_s$, and plot the fitted value of $N_0$ as a function of
the probe frequency. The results are shown in Fig. 26.
There is an undulating structure, with no significantly
large deviations from zero. At $\Delta M_s$ near 17.77 ps$^{-1}$ the
data prefer a negative oscillation amplitude (and hence a 
FIG. 23: The simulated distributions of the invariant mass of the $B_d^0 \to J/\psi K^*$ decay products reconstructed under the
$B_s^0 \to J/\psi\phi$ hypothesis for $1.01 < M(KK) < 1.03$ GeV (left) and $1.03 < M(KK) < 1.05$ GeV (right). The curves are results
of fits assuming a sum of two Gaussian functions.

FIG. 24: (color online). Invariant mass distributions of $B_s^0$ candidates with decay length $ct > 0.02$ cm for $1.01 < M(KK) < 1.03$
GeV (left) and $1.03 < M(KK) < 1.05$ GeV (right). Fits to a sum (black line) of a Gaussian function representing the signal
(red), an MC simulation-based template for the $B_d^0 \to J/\psi K^*$ reflection (green line), and a linear function representing the
background are used to extract the $B_s^0$ yield.



negative value of $\sin\phi_s^{J/\psi\phi}$). The statistical uncertainty
of the result of this simple approach does not take into
account uncertainties of the dilution factors, related to
the time resolution, CP-odd fraction, and the S-wave
fraction.
FIG. 25: Proper decay length evolution of the difference
$\Delta N = N(B_s^0) - N(\bar{B}_s^0)$ in the first 0.09 cm (3 ps) for the
Square-cuts sample. The curve represents the best fit to the
oscillation with the frequency of $\Delta M_s = 17.77$ ps$^{-1}$.

FIG. 26: (color online). The fitted magnitude of the $B_s^0 - \bar{B}_s^0$ oscillation as a function of $\Delta M_s$ for (a) BDT selection and (b)
Square cuts. The red crosses correspond to $\Delta M_s = 17.77$ ps$^{-1}$.



